Data Ingestion Tutorial

Overview

Welcome to the data ingestion tutorial. In this tutorial, we'll go over the steps required to ingest data into EpiData. We'll also query the data to verify that the ingestion was successful.

Generate Access Token

The first step in ingesting data into EpiData is obtaining an access token for session authentication. At present, EpiData supports GitHub Personal Access Tokens.

You can create a Personal Access Token at https://github.com/settings/tokens.

Modify Ingestion Example

Once an access token is available, you can use it within the data ingestion program by following these steps:

  • Download the Python ingestion example, sensor_data_ingest.py, from your Notebook tree view.

  • Set the ACCESS_TOKEN variable in sensor_data_ingest.py to the Personal Access Token you created on GitHub's website:

    • ACCESS_TOKEN = '<Personal Access Token>'

  • Modify the default values of the following variables (optional):

    • COMPANY
    • SITE
    • STATION
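For reference, an edited configuration section might look like the sketch below. It is an illustration, not the actual contents of sensor_data_ingest.py: reading the token from an EPIDATA_ACCESS_TOKEN environment variable and the load_access_token helper are hypothetical additions that keep the token out of the source file, and the COMPANY, SITE and STATION values simply mirror the ones used in the query cell later in this tutorial.

```python
import os

# Placeholder text from this tutorial; a real token must replace it.
PLACEHOLDER = '<Personal Access Token>'


def load_access_token(environ=os.environ):
    """Return the GitHub Personal Access Token.

    Hypothetical helper: reads the token from the EPIDATA_ACCESS_TOKEN
    environment variable (an assumed name, not an EpiData convention) and
    raises ValueError if the token is missing or still the placeholder.
    """
    token = environ.get('EPIDATA_ACCESS_TOKEN', PLACEHOLDER).strip()
    if not token or token == PLACEHOLDER:
        raise ValueError('Set EPIDATA_ACCESS_TOKEN to your GitHub Personal Access Token')
    return token


# Optional identifiers attached to every ingested measurement.
COMPANY = 'EpiData'
SITE = 'San_Jose'
STATION = 'WSN-1'
```

Keeping the token in the environment means the script can be shared or committed without exposing credentials.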

Run Ingestion Example

The next step is to run the updated example, sensor_data_ingest.py, with a Python 2.x interpreter. The example sends data to the EpiData server through its REST interface, and you should see the status of each ingestion step on standard output.
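The exact request format used by sensor_data_ingest.py is not described in this tutorial. As a rough sketch, assuming each reading is sent as a JSON object, a record could carry the same fields that the query later in this tutorial selects (company, site, station, sensor, ts, meas_name, meas_value, meas_unit). The make_measurement helper, the millisecond timestamp encoding, and the sample values are hypothetical; check the example script for the actual payload and endpoint.

```python
import json
import time


def make_measurement(company, site, station, sensor, meas_name, value, unit, ts=None):
    """Build one sensor reading as a dict (hypothetical payload shape).

    ts is milliseconds since the Unix epoch; this encoding is an assumption
    made for illustration only.
    """
    if ts is None:
        ts = int(time.time() * 1000)
    return {
        'company': company,
        'site': site,
        'station': station,
        'sensor': sensor,
        'ts': ts,
        'meas_name': meas_name,
        'meas_value': value,
        'meas_unit': unit,
    }


# Sample values are made up; 1501545600000 ms is 2017-08-01 00:00:00 UTC.
record = make_measurement('EpiData', 'San_Jose', 'WSN-1',
                          'Temperature_Probe', 'Temperature', 72.5, 'deg F',
                          ts=1501545600000)
body = json.dumps(record)  # JSON text a REST client could send in a request body
```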

Let the example run and ingest data for a short period of time, then interrupt it with Ctrl-C.
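A program that you stop with Ctrl-C typically runs a loop that reads, sends and reports one sample at a time until Python raises KeyboardInterrupt. The sketch below shows that general pattern; it is not the code of sensor_data_ingest.py, and read_sample, send and the interval are hypothetical stand-ins you would supply (for example, a send function that posts one JSON record to the server).

```python
import time


def run_ingestion(read_sample, send, interval=1.0, max_samples=None):
    """Send samples until Ctrl-C (KeyboardInterrupt) or max_samples is reached.

    read_sample() returns one record and send(record) transmits it; both are
    hypothetical callables. Returns the number of records sent successfully.
    """
    sent = 0
    try:
        while max_samples is None or sent < max_samples:
            record = read_sample()
            send(record)
            sent += 1
            # Report the status of each ingestion step on standard output.
            print('Ingested record {0}: {1}'.format(sent, record))
            time.sleep(interval)
    except KeyboardInterrupt:
        # Ctrl-C lands here, so the loop ends cleanly instead of with a traceback.
        print('Interrupted by user after {0} records'.format(sent))
    return sent
```

Catching KeyboardInterrupt around the whole loop lets the program report how many records were sent before it exits.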

Query and Display Data

We'll now query the database for the data ingested in the previous step. Start by running the cell below, which imports the required modules.

    # ec is the EpiData context object used below to run queries.
    from epidata.context import ec
    from datetime import datetime, timedelta
    import pandas as pd
    

In the cell below, modify the variables COMPANY, SITE and STATION to match the data you recently ingested, then run the cell to query the data.

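    # These should match the COMPANY, SITE and STATION values used during ingestion.
    COMPANY = 'EpiData'
    SITE = 'San_Jose'
    STATION = 'WSN-1'
    # Query window: only measurements with timestamps in this range are returned.
    start_time = datetime.strptime('8/1/2017 00:00:00', '%m/%d/%Y %H:%M:%S')
    stop_time = datetime.strptime('8/31/2017 00:00:00', '%m/%d/%Y %H:%M:%S')

    # Query measurements from the station's three sensors.
    primary_key = {
        "company": COMPANY,
        "site": SITE,
        "station": STATION,
        "sensor": ["Temperature_Probe", "Anemometer", "RH_Probe"],
    }
    df = ec.query_measurements_original(primary_key, start_time, stop_time)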
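The start_time and stop_time above cover August 2017. If the ingestion example stamps readings with the current time, that fixed window will not include them, so you may want to query a window ending now instead. The recent_window helper and the two-hour span below are illustrative choices, and whether the query interprets these naive datetimes as local time or UTC is an assumption to check against your ingested timestamps.

```python
from datetime import datetime, timedelta


def recent_window(hours=1, now=None):
    """Return (start_time, stop_time) spanning the last `hours` hours.

    Uses local time via datetime.now() unless `now` is given; whether the
    query expects local time or UTC is an assumption to verify.
    """
    stop = now if now is not None else datetime.now()
    return stop - timedelta(hours=hours), stop


start_time, stop_time = recent_window(hours=2)
```

The resulting start_time and stop_time can then be passed to ec.query_measurements_original in place of the fixed dates.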
    

Next, we'll display the first few records with the df.show() function and visually verify that they match the ingested data.

    # Keep the columns of interest, then print the first five rows.
    df = df.select("company", "site", "station", "ts", "meas_name", "meas_value", "meas_unit")
    df.show(5)
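Visual inspection works for a handful of rows. For a slightly more systematic check, you could count measurements per meas_name and confirm that every row belongs to the expected company, site and station. This sketch assumes df is a Spark DataFrame (which its select and show methods suggest), so its rows could be pulled to the notebook with [r.asDict() for r in df.collect()]; the summarize_rows helper is hypothetical, and collecting all rows is only sensible for the small amount of data ingested in this tutorial.

```python
from collections import Counter


def summarize_rows(rows, company, site, station):
    """Count rows per meas_name and collect rows with unexpected identifiers.

    rows is a list of dicts containing the columns selected above. Returns
    (counts, mismatches), where mismatches holds rows whose company, site or
    station differs from the expected values.
    """
    counts = Counter()
    mismatches = []
    for row in rows:
        counts[row['meas_name']] += 1
        if (row['company'], row['site'], row['station']) != (company, site, station):
            mismatches.append(row)
    return counts, mismatches
```

In the notebook this could be called as summarize_rows([r.asDict() for r in df.collect()], COMPANY, SITE, STATION); nonzero counts and an empty mismatches list suggest the ingested data was found.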
    

Summary

Congratulations, you have successfully established an authenticated session, ingested sample data, and queried the ingested data. These are the basic steps involved in using EpiData for sensor measurements.